What can we learn about the selection mechanism for post-editing?

Authors

  • Maja Popović
  • Eleftherios Avramidis
  • Aljoscha Burchardt
  • David Vilar
  • Hans Uszkoreit
Abstract

Post-editing is an increasingly common form of human-machine cooperation for translation. One possible support for the post-editing task is to offer a human translator several machine translation outputs from which they can choose the most suitable one. This paper investigates the selection process for such a method in order to gain better insight into it, so that it can be optimally automatised in future work. Experiments show that only about 70% of the selected sentences are the best-ranked ones, and that the selection mechanism is tightly related to edit distance. Furthermore, five types of performed edit operations are analysed: correcting word form, reordering, adding missing words, deleting extra words and correcting lexical choice.

1 Motivation and related work

Machine translation (MT) has improved considerably in recent years, thus gaining recognition in the translation industry. However, MT outputs have not yet reached the same quality as human translations, so post-editing has become a common practice for improving them. Therefore, more and more attention is being paid to various aspects of post-editing, for example in (Specia, 2011). Prediction of errors in rule-based system outputs has been investigated in (Valotkaite and Asadullah, 2012) in order to facilitate the post-editing process. Analysis of edit operations has been carried out in (Koponen, 2012) in order to understand discrepancies between edit distance and translation quality (i.e. predicted post-editing effort). Our work explores the selection criteria applied by professional translators when several translation outputs of each source sentence are offered for post-editing.
The scenario is similar to the one in (He et al., 2010), but our approach goes beyond it: they consider only two outputs (one produced by a statistical machine translation system and the other by a translation memory), they do not examine the ranking of these outputs, they have not tested their automatic method with professional translators, and they do not analyse edit distances or the performed edit operations.

Our main questions are:

  • Is the translation output which is best for post-editing also the best-ranked one?
  • Is the edit distance of the chosen output lower than the edit distances of the other outputs?
  • Are there some (less) preferred edit operations?

To the best of our knowledge, these questions have not been investigated yet.

2 Experimental setup

The translation outputs investigated in this work are produced by German-English, German-French and German-Spanish machine translation systems in both directions. The test sets cover three domains: news texts taken from WMT tasks (Callison-Burch et al., 2010), technical documentation extracted from the freely available OpenOffice project (Tiedemann, 2009) and client data owned by project partners. The number of

Sharon O’Brien, Michel Simard and Lucia Specia (eds.) Proceedings of MT Summit XIV Workshop on Post-editing Technology and Practice, Nice, September 2, 2013, p. 11–17. ©2013 The Authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.

        News   OpenOffice   Client   Total
de-en   1788   418          50
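The second research question above (whether the chosen output has a lower edit distance than the alternatives) can be illustrated with a small sketch. This excerpt does not state which edit distance measure the authors use, so the code below assumes a plain word-level Levenshtein distance between each machine output and the post-edited sentence. The function names `word_edit_distance` and `selected_has_lowest_distance`, and the example sentences, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (assumed measure, not confirmed by the paper)."""
    # prev[j] holds the distance between the hypothesis prefix seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def selected_has_lowest_distance(outputs: List[str], selected: int,
                                 post_edited: str) -> bool:
    """Check whether the output chosen by the translator is (one of) the
    closest to the post-edited sentence under the assumed distance."""
    ref = post_edited.split()
    distances = [word_edit_distance(o.split(), ref) for o in outputs]
    return distances[selected] == min(distances)


# Hypothetical example: three system outputs for one source sentence.
outputs = [
    "the cat sat on mat",
    "cat the sits on the mat",
    "the cat sat on the mat",
]
post_edited = "the cat sat on the mat"
print(selected_has_lowest_distance(outputs, 2, post_edited))
```

Whitespace tokenisation and unnormalised counts keep the sketch short. A real analysis would probably normalise by sentence length and might use a measure that also models reordering.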

Similar resources

Adopting New International Health Instruments – What Can We Learn From the FCTC?; Comment on “The Legal Strength of International Health Instruments - What It Brings to Global Health Governance?”

This Commentary forms a response to Nikogosian’s and Kickbusch’s forward-looking perspective about the legal strength of international health instruments. Building on their arguments, in this commentary we consider what we can learn from the Framework Convention on Tobacco Control (FCTC) for the adoption of new legal international health instruments.

What Can We Learn from 21 Years of School Poisonings in New Zealand?

Background: Childhood poisoning is a significant international health concern. Very little is known about trends in exposures within schools and preschools. The objectives of this study were to investigate the data recorded by the New Zealand National Poisons Centre (NPC) on these types of exposures over a 21-year period (1989 to 2009) and to determine trends and propose strategies to reduce th...

What Can We Learn About the Processes of Regulation of Tuberculosis Medicines From the Experiences of Health Policy and System Actors in India, Tanzania, and Zambia?

Background The unregulated availability and irrational use of tuberculosis (TB) medicines is a major issue of public health concern globally. Governments of many low- and middle-income countries (LMICs) have committed to regulating the quality and availability of TB medicines, but with variable success. Regulation of TB medicines remains an intractable challenge in many settings, but the reason...

Who Is a Successful Manager?

Who is a really successful manager? A manager who spends less money, or one who earns more? A manager who can survive for a longer period of time, or an administrator who expands his organization and opens up new branches? Which one is the most successful? The article tries to answer these questions and provides some simple guidelines for managers in every domain of management who wan...

A Global Social Support System: What the International Community Could Learn From the United States’ National Basketball Association’s Scheme for Redistribution of New Talent

If global trade were fair, it is argued, then international aid would be unnecessary and inequalities inherent to the economic system would be justifiable. Here, we argue that while global trade is unfair, in part because richer countries set the rules, we believe that additional interventions must go beyond trade regulation and short-term aid to redress inequalities among countries that will p...

Publication date: 2013